4 research outputs found

    A Multiple Classifier System for Predicting Best-Selling Amazon Products

    In this work, I examine a dataset of Amazon product metadata and propose a heterogeneous multiple classifier system for identifying best-selling products across multiple categories. The system consumes the product description and the featured product image as input and feeds them through binary classifiers of five types: Convolutional Neural Network, Naïve Bayes, Random Forest, Ridge Regression, and Support Vector Machine. While each individual model largely succeeds at distinguishing best-selling products from non-best-selling and worst-selling products, the multiple classifier system is stronger than any individual model in the majority of cases when distinguishing best-selling from non-best-selling products, achieving up to 83.3% accuracy depending on the product category. To the best of my knowledge, this research is the first application of ensemble learning to Amazon product data of this type and the first use of product images and Convolutional Neural Networks to predict product success.
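    The combination step of such a heterogeneous multiple classifier system can be sketched as a simple majority vote over the individual models' binary predictions. The abstract does not specify the combination rule, so the voting scheme and the per-model predictions below are illustrative assumptions, not the paper's method:

```python
# Minimal sketch of majority voting over heterogeneous binary classifiers.
# The tie-breaking rule (ties -> 0, i.e. "not a best-seller") is an assumption.
def majority_vote(predictions):
    """Combine binary votes (0/1) from several classifiers into one decision."""
    votes = sum(predictions)
    return 1 if votes > len(predictions) / 2 else 0

# Hypothetical example: five classifiers (CNN, Naive Bayes, Random Forest,
# Ridge, SVM) each emit a binary prediction for one product.
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes))  # -> 1 (three of five predict "best-seller")
```

    An odd number of voters, as here, avoids most ties; with an even number, the tie-breaking rule matters and should be chosen to match the cost of false positives.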

    Lexical Semantic Recognition

    In lexical semantics, full-sentence segmentation and segment labeling of various phenomena are generally treated separately, despite their interdependence. We hypothesize that a unified lexical semantic recognition task is an effective way to encapsulate previously disparate styles of annotation, including multiword expression identification/classification and supersense tagging. Using the STREUSLE corpus, we train a neural CRF sequence tagger and evaluate its performance along various axes of annotation. As the label set generalizes that of previous tasks (PARSEME, DiMSUM), we additionally evaluate how well the model generalizes to those test sets, finding that it approaches or surpasses existing models despite training only on STREUSLE. Our work also establishes baseline models and evaluation metrics for integrated and accurate modeling of lexical semantics, facilitating future work in this area.
    Comment: 11 pages, 3 figures; to appear at MWE 202
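    A sequence tagger of this kind typically emits BIO-style tags that jointly encode segmentation and labels; recovering labeled segments from the tag sequence is a small decoding step. The helper and the example tags below are an illustrative sketch of that conversion, not code from the paper (the actual STREUSLE tagset is richer than shown):

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end, label) spans (end exclusive)."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):          # a new segment begins
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif tag.startswith("I-") and start is not None:
            continue                      # segment continues
        else:                             # "O" (or a stray I-) closes any open segment
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
    if start is not None:                 # flush a segment that runs to the end
        spans.append((start, len(tags), label))
    return spans

# Hypothetical supersense-style tags for a four-token sentence.
tags = ["B-V.MOTION", "I-V.MOTION", "O", "B-N.PERSON"]
print(bio_to_spans(tags))  # -> [(0, 2, 'V.MOTION'), (3, 4, 'N.PERSON')]
```

    Because segmentation and labeling live in one tag sequence, a single tagger trained on such tags addresses both subtasks at once, which is the unification the abstract argues for.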

    Making Heads and Tails of Models with Marginal Calibration for Sparse Tagsets

    For interpreting the behavior of a probabilistic model, it is useful to measure the model's calibration: the extent to which it produces reliable confidence scores. We address the open problem of calibration for tagging models with sparse tagsets, and recommend strategies to measure and reduce calibration error (CE) in such models. We show that several post-hoc recalibration techniques all reduce CE across the marginal distribution for two existing sequence taggers. Moreover, we propose tag frequency grouping (TFG) as a way to measure CE in different frequency bands. Further, recalibrating each group separately promotes a more equitable reduction of CE across the tag frequency spectrum.
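    A standard way to quantify calibration error is a binned estimate: group predictions by confidence, then take the weighted gap between each bin's accuracy and its mean confidence. The abstract does not give its exact estimator, so the binned version and the toy inputs below are an illustrative sketch, not the paper's formulation:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned calibration error: sum over bins of
    (bin weight) * |bin accuracy - bin mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(acc - conf)
    return ece

# Toy example: four predictions with their confidences and correctness.
err = expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1])
print(err)  # approximately 0.1125
```

    Tag frequency grouping, as proposed in the abstract, would apply a measurement like this within frequency bands of the tagset (e.g. rare vs. common tags) rather than over the whole marginal distribution, so that miscalibration on rare tags is not masked by frequent ones.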